Add neural search default processor for non OpenAI/Cohere scenario #1274

Merged


zane-neo
Collaborator

@zane-neo zane-neo commented Sep 1, 2023

Description

ml-commons has two default pre/post-process functions, one for OpenAI and one for Cohere, both written in Painless script. There is no default pre/post-process function for the neural search plugin's text embedding case, so a user who wants to use neural search with a remote text embedding model has to write a complex Painless script, which is a heavy burden. This PR does two things:

  1. Add default pre/post-process functions for neural search plugin text embedding. Once a user defines their model output as List<List> as suggested, the connector's pre/post-process can be set to the strings connector.pre_process.neural_search.text_embedding or connector.post_process.neural_search.text_embedding, which makes neural search text embedding easy to use.
  2. Rewrite the default OpenAI and Cohere Painless scripts as Java code to make them more readable for both developers and users.
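A minimal Java sketch of how named default pre/post-process functions could be resolved and applied follows. It is not the ml-commons implementation. The class `DefaultPostProcessSketch`, the methods `postProcess` and `preProcessTextEmbedding`, the `{"parameters": {"input": [...]}}` request layout, and the OpenAI/Cohere function names are all hypothetical; only the neural search function name comes from the description above. The OpenAI and Cohere branches assume already-parsed response shapes of `{"data": [{"embedding": [...]}]}` and `{"embeddings": [[...]]}`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not the ml-commons classes: built-in pre/post-process
// functions looked up by the name a connector would reference.
public class DefaultPostProcessSketch {

    // Name taken from the PR description.
    public static final String NEURAL_SEARCH_TEXT_EMBEDDING =
        "connector.post_process.neural_search.text_embedding";
    // Assumed names for the OpenAI and Cohere variants.
    public static final String OPENAI_EMBEDDING = "connector.post_process.openai.embedding";
    public static final String COHERE_EMBEDDING = "connector.post_process.cohere.embedding";

    private static final Map<String, Function<Object, List<List<Float>>>> REGISTRY = new HashMap<>();

    static {
        // Model output is already List<List<Number>>: one inner list per input text.
        REGISTRY.put(NEURAL_SEARCH_TEXT_EMBEDDING, DefaultPostProcessSketch::fromNestedLists);
        // Assumed OpenAI-style shape: {"data": [{"embedding": [...]}, ...]}
        REGISTRY.put(OPENAI_EMBEDDING, response -> {
            List<Object> vectors = new ArrayList<>();
            for (Object item : (List<?>) ((Map<?, ?>) response).get("data")) {
                vectors.add(((Map<?, ?>) item).get("embedding"));
            }
            return fromNestedLists(vectors);
        });
        // Assumed Cohere-style shape: {"embeddings": [[...], ...]}
        REGISTRY.put(COHERE_EMBEDDING,
            response -> fromNestedLists(((Map<?, ?>) response).get("embeddings")));
    }

    // Hypothetical pre-process: wraps input texts as {"parameters": {"input": [...]}}.
    public static Map<String, Object> preProcessTextEmbedding(List<String> texts) {
        Map<String, Object> parameters = new HashMap<>();
        parameters.put("input", new ArrayList<>(texts));
        Map<String, Object> request = new HashMap<>();
        request.put("parameters", parameters);
        return request;
    }

    // Converts List<List<Number>> to List<List<Float>>; any other shape throws.
    static List<List<Float>> fromNestedLists(Object output) {
        if (!(output instanceof List)) {
            throw new IllegalArgumentException("Model output must be a List<List<Number>>");
        }
        List<List<Float>> embeddings = new ArrayList<>();
        for (Object row : (List<?>) output) {
            if (!(row instanceof List)) {
                throw new IllegalArgumentException("Each embedding must be a list of numbers");
            }
            List<Float> vector = new ArrayList<>();
            for (Object value : (List<?>) row) {
                vector.add(((Number) value).floatValue());
            }
            embeddings.add(vector);
        }
        return embeddings;
    }

    // Looks up a built-in post-process function by name and applies it.
    public static List<List<Float>> postProcess(String functionName, Object modelOutput) {
        Function<Object, List<List<Float>>> fn = REGISTRY.get(functionName);
        if (fn == null) {
            throw new IllegalArgumentException("Unknown post-process function: " + functionName);
        }
        return fn.apply(modelOutput);
    }

    public static void main(String[] args) {
        Object raw = List.of(List.of(0.1, 0.2), List.of(0.3, 0.4));
        System.out.println(postProcess(NEURAL_SEARCH_TEXT_EMBEDDING, raw));
        System.out.println(preProcessTextEmbedding(List.of("hello", "world")));
    }
}
```

Keeping the functions in a name-keyed registry of plain Java lambdas, rather than Painless source strings, is one way to get the readability benefit described in item 2: each response shape is handled by ordinary typed code that can be unit-tested directly.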

Issues Resolved

#1304

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@zane-neo zane-neo temporarily deployed to ml-commons-cicd-env September 25, 2023 00:26 — with GitHub Actions Inactive
@zane-neo zane-neo temporarily deployed to ml-commons-cicd-env September 25, 2023 00:47 — with GitHub Actions Inactive
@zane-neo zane-neo temporarily deployed to ml-commons-cicd-env September 25, 2023 19:36 — with GitHub Actions Inactive
@zane-neo zane-neo merged commit b48546a into opensearch-project:main Sep 26, 2023
6 of 8 checks passed
zane-neo added a commit that referenced this pull request Sep 26, 2023
* Add Auto Release Workflow (#1306)

* Add Auto Release Workflow

Signed-off-by: Sicheng Song <[email protected]>

* Fix release note address

Signed-off-by: Sicheng Song <[email protected]>

---------

Signed-off-by: Sicheng Song <[email protected]>

* Bump aws-encryption-sdk-java to fix CVE-2023-33201 (#1309)

Signed-off-by: Sicheng Song <[email protected]>

* Add release note for 2.10.0 release (#1312)

* Add release note for 2.10.0

Signed-off-by: Sicheng Song <[email protected]>

* Add CVE fix

Signed-off-by: Sicheng Song <[email protected]>

---------

Signed-off-by: Sicheng Song <[email protected]>

* fixing doc link (#1318)

* fixing doc link

Signed-off-by: Dhrubo Saha <[email protected]>

* fixing indentation

Signed-off-by: Dhrubo Saha <[email protected]>

---------

Signed-off-by: Dhrubo Saha <[email protected]>

* Fix unassigned ml system shard replicas (#1315) (#1324)

* Fix unassigned ml system shard replicas

* Adjust auto replica settings to keep it consistent with AOS default setting

* Update plugin/src/main/java/org/opensearch/ml/indices/MLIndicesHandler.java

* Modify exception handling

* Modify exception messages

* Add response check

* Add response check and exception handling

* Keep error message consistent

* Keep error message consistent

* Keep error message consistent

---------

Signed-off-by: Sicheng Song <[email protected]>
Co-authored-by: Yaliang Wu <[email protected]>

* Adjust index replicas settings to keep consistent with AOS 2.9 (#1325)

Signed-off-by: Sicheng Song <[email protected]>

* Make 2.10 release notes up to date (#1345)

Signed-off-by: Sicheng Song <[email protected]>

* fix spelling (#1363)

Signed-off-by: Kalyan <[email protected]>

* Add neural search default processor for non OpenAI/Cohere scenario (#1274)

* Add neural search default pre/post process function support

Signed-off-by: zane-neo <[email protected]>

* Fix UT failures

Signed-off-by: zane-neo <[email protected]>

* Address PR comment to remove nonJson response case

Signed-off-by: zane-neo <[email protected]>

* Fix low code coverage issue

Signed-off-by: zane-neo <[email protected]>

* fix format issue

Signed-off-by: zane-neo <[email protected]>

* Try to fix classNotFound issue in IT

Signed-off-by: zane-neo <[email protected]>

* revert Try to fix classNotFound issue in IT

Signed-off-by: zane-neo <[email protected]>

* Change gson dependency to compileOnly

Signed-off-by: zane-neo <[email protected]>

* Change default pre/post process function name

Signed-off-by: zane-neo <[email protected]>

* Address code review comments

Signed-off-by: zane-neo <[email protected]>

* Make preprocess function to default

Signed-off-by: zane-neo <[email protected]>

* Remove GsonUtil since there is already a single instance in StringUtils

Signed-off-by: zane-neo <[email protected]>

* Fix UT failures

Signed-off-by: zane-neo <[email protected]>

* Address comments

Signed-off-by: zane-neo <[email protected]>

* use import instead of fully qualified name

Signed-off-by: zane-neo <[email protected]>

---------

Signed-off-by: zane-neo <[email protected]>

---------

Signed-off-by: Sicheng Song <[email protected]>
Signed-off-by: Dhrubo Saha <[email protected]>
Signed-off-by: Kalyan <[email protected]>
Signed-off-by: zane-neo <[email protected]>
Co-authored-by: Sicheng Song <[email protected]>
Co-authored-by: Dhrubo Saha <[email protected]>
Co-authored-by: Yaliang Wu <[email protected]>
Co-authored-by: Kalyan <[email protected]>
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1274-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 b48546ae89a51f303ef23ac8d605e3e343b67b34
# Push it to GitHub
git push --set-upstream origin backport/backport-1274-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1274-to-2.x.

@zane-neo zane-neo mentioned this pull request Sep 27, 2023
5 tasks
ylwu-amzn pushed a commit to ylwu-amzn/ml-commons that referenced this pull request Oct 4, 2023
…pensearch-project#1274)

* Fix breaking change caused by opensearch core

Signed-off-by: zane-neo <[email protected]>

* Add neural search default pre/post process function support

Signed-off-by: zane-neo <[email protected]>

* Fix UT failures

Signed-off-by: zane-neo <[email protected]>

* Fix conflicts when backport

Signed-off-by: zane-neo <[email protected]>

* Fix conflict when backport

Signed-off-by: zane-neo <[email protected]>

---------

Signed-off-by: zane-neo <[email protected]>
ylwu-amzn pushed a commit that referenced this pull request Oct 4, 2023
…1274)

* Fix breaking change caused by opensearch core

Signed-off-by: zane-neo <[email protected]>

* Add neural search default pre/post process function support

Signed-off-by: zane-neo <[email protected]>

* Fix UT failures

Signed-off-by: zane-neo <[email protected]>

* Fix conflicts when backport

Signed-off-by: zane-neo <[email protected]>

* Fix conflict when backport

Signed-off-by: zane-neo <[email protected]>

---------

Signed-off-by: zane-neo <[email protected]>

4 participants